LASER: Low-Rank Activation SVD for Efficient Recursion
Ege Çakar, Ketan Ali Raghu, Lia Zheng
Recursive architectures such as Tiny Recursive Models (TRMs) perform implicit reasoning through iterative latent computation, yet the geometric structure of these reasoning trajectories remains poorly understood. We investigate the activation manifold of TRMs during recursive unrolling and find that activations occupy an effectively linear, low-dimensional subspace whose principal directions can be tracked dynamically with cheap power iterations. This suggests that weight-sharing concentrates iterative computation along a small number of dominant eigendirections, and we find that this concentration varies sharply across computational sites. We exploit this structure through LASER (Low-Rank Activation SVD for Efficient Recursion), a dynamic compression framework that maintains an evolving low-rank basis via matrix-free subspace tracking with a fidelity-triggered reset mechanism, achieving ${\sim}60\%$ activation memory savings with no statistically significant accuracy degradation. Our analysis raises questions about how recursive architectures allocate representational capacity during implicit reasoning, and whether this concentration can be exploited to improve the efficiency and stability of latent computation.
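As a concrete illustration of the mechanism described above, the following is a minimal NumPy sketch of power-iteration-based subspace tracking with a fidelity-triggered reset: one block power-iteration step per recursion step maintains an orthonormal rank-$k$ basis, activations are stored as rank-$k$ coefficients, and an exact truncated SVD resets the basis whenever the relative reconstruction error exceeds a tolerance. The names (`SubspaceTracker`, `tol`) and the specific reset criterion are illustrative simplifications under our own assumptions, not the exact implementation.

```python
import numpy as np

def orthonormalize(B):
    """Re-orthonormalize the basis columns via a reduced QR factorization."""
    Q, _ = np.linalg.qr(B)
    return Q

class SubspaceTracker:
    """Matrix-free tracking of the dominant rank-k activation subspace.

    Each call to `step` performs one block power-iteration step
    (B <- orth(A^T A B)), stores activations as rank-k coefficients,
    and resets the basis with an exact truncated SVD when the relative
    reconstruction error exceeds `tol` (the fidelity trigger).
    """

    def __init__(self, dim, rank, tol=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.rank, self.tol = rank, tol
        self.B = orthonormalize(rng.standard_normal((dim, rank)))  # (dim, rank)

    def step(self, A):
        # A: (n_tokens, dim) activations at the current recursion step.
        # One matrix-free power step: only products with A and A^T are used.
        self.B = orthonormalize(A.T @ (A @ self.B))

        coeffs = A @ self.B               # (n_tokens, rank) -- what we store
        A_hat = coeffs @ self.B.T         # rank-k reconstruction
        rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)

        if rel_err > self.tol:
            # Fidelity-triggered reset: recompute the basis exactly.
            _, _, Vt = np.linalg.svd(A, full_matrices=False)
            self.B = Vt[: self.rank].T
            coeffs = A @ self.B
        return coeffs, rel_err

if __name__ == "__main__":
    # Synthetic activations lying near a fixed 16-dimensional subspace.
    rng = np.random.default_rng(42)
    U = np.linalg.qr(rng.standard_normal((256, 16)))[0]
    tracker = SubspaceTracker(dim=256, rank=16, tol=0.1)
    for t in range(8):
        A = rng.standard_normal((64, 16)) @ U.T \
            + 0.01 * rng.standard_normal((64, 256))
        _, err = tracker.step(A)
        print(f"step {t}: relative reconstruction error {err:.4f}")
```

Storing the `(n_tokens, rank)` coefficients plus one shared `(dim, rank)` basis in place of the full `(n_tokens, dim)` activations is where the memory savings come from; when the tracked rank is a small fraction of the hidden dimension, the reduction is on the order of the ${\sim}60\%$ figure quoted above.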
Appendix
The third entry varies under perturbation, so we can compute the local indicator matrices at this layer accordingly. We inherit the notation from the main text and use $D^L$ to denote the indicator matrix for linear ReLU outputs. The key observation from this approach is that we can "merge" the weight matrices together for linear neurons (the first term in Eq. (19)): since each $D^L$ is a 0/1 diagonal matrix with $\|D^L\| \le 1$, submultiplicativity of the spectral norm gives $\|W_3 D_2^L W_2 D_1^L W_1\| \le \|W_3\|\,\|W_2\|\,\|W_1\|$. Consider a neural network that maps input $x$ to output $z = F(x)$, where $z \in \mathbb{R}^N$.
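The snippet below gives a quick numerical check of this bound (an illustrative sketch, not code from our experiments): with random weights and 0/1 diagonal indicator matrices, the spectral norm of the merged product never exceeds the product of the individual weight norms, because $\|D^L\| \le 1$ and the spectral norm is submultiplicative.

```python
import numpy as np

rng = np.random.default_rng(1)
spec = lambda M: np.linalg.norm(M, 2)   # spectral norm (largest singular value)

# Random weights and 0/1 diagonal indicator matrices for linear ReLU outputs.
W1, W2, W3 = (rng.standard_normal((16, 16)) for _ in range(3))
D1, D2 = (np.diag(rng.integers(0, 2, 16).astype(float)) for _ in range(2))

lhs = spec(W3 @ D2 @ W2 @ D1 @ W1)      # merged product
rhs = spec(W3) * spec(W2) * spec(W1)    # product of individual norms
assert lhs <= rhs + 1e-9
print(f"||W3 D2 W2 D1 W1|| = {lhs:.3f} <= {rhs:.3f} = ||W3|| ||W2|| ||W1||")
```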